Abstract
The purpose of this project is to analyze the release of an economic indicator as an event that can generate temporary patterns in the price reactions of a financial asset. These price reactions help to detect patterns and use them to build a trading system. For this specific case study, we will use the unemployment rate as the economic indicator and the USDMXN currency pair as the traded asset.
How can an economic indicator influence the behaviour of a selected currency? This question will be answered in the following report, going from a contextualization and exploration of the obtained data to the creation of an optimized trading system that returns an overall profit, using the unemployment rate and the USDMXN price history to make decisions about the transactions made with the previously mentioned currency.
The first section of this Market Microstructure and Trading Systems project consists of a historical analysis of both of the mentioned time series, leading to a conceptual definition of the indicator and an empirical capital management strategy.
The second section of the project is divided into statistical and computational aspects. For the statistical analysis, different tests are performed to give a broad understanding of the unemployment rate. In computational terms, the subsection shows the juxtaposition between the USDMXN time series and the unemployment rate data, creating scenarios to correlate the information. Furthermore, based on the analyzed data, certain parameters will be defined from specific metrics calculated for the currency's behaviour every time the indicator is released, in other words, for a specific time window.
The following section contains the definition of the trading system. The proposed system has as parameters the trade volume, the Take Profit and the Stop Loss. In brief, the trading system makes a transaction every time the indicator is reported and does not close the transaction (perform the opposite operation) until the price touches a barrier defined by the Take Profit or the Stop Loss.
The final section mainly contains the trading system optimization, performed with a PSO (Particle Swarm Optimization) algorithm and its corresponding constraints, over a profitability metric such as the Sharpe Ratio. Once the system has been optimized, its performance is presented with performance attribution metrics such as the Sharpe, Sortino and Treynor ratios.
Overall, the purpose of this report is to take the skills learned in this course and apply them to real-life data. Evidence of this learning process is presented throughout the notebook.
In order to run this notebook, it is necessary to install the file dependencies listed in the requirements.txt file:
%%capture
# Install all the pip packages in the requirements.txt
import sys
!{sys.executable} -m pip install -r requirements.txt
### Import libraries to use
import pandas as pd
import numpy as np
from statsmodels.graphics.gofplots import qqplot
from pyswarm import pso
import pyswarms as ps
import warnings
### Import scripts
import functions as fn
import visualizations as vis
import data as dt
usdmxn_18 = pd.read_csv('files/MP_M1_2018.csv')
usdmxn_19 = pd.read_csv('files/MP_M1_2019.csv')
usdmxn_20 = pd.read_csv('files/MP_M1_2020.csv')
unemployment = pd.read_excel('files/Unemployment_Rate.xlsx')
The currency data contains the information related to USDMXN and is the result of concatenating 3 CSV files. Each file contains one year of data (2018, 2019, 2020) of prices for continuous futures contracts. The source of this data is unknown since it was given as class material.
The unemployment rate data comes in an Excel file. In this case, the information corresponds to the same 3-year period obtained for the currency data. The source of this material is FactSet.
The chosen indicator to analyze for its impact on the behaviour of the previously stated currency is the unemployment rate. The unemployment rate can be interpreted as the percentage of unemployed people relative to the total labor force. In other words, this rate represents the proportion of the labor force that is not currently working but is actively searching for a job.
The next formula shows how the indicator is obtained. It is the result of dividing the number of people without a job by all the people, whether employed or unemployed, contained in the working-age population category. For the US this refers to people aged between 15 and 64. Finally, to express the indicator as a percentage, the result of the division is multiplied by 100.
$$Unemployment \ Rate = \frac{Number\ of \ people \ unemployed}{Total \ labor \ force} * \left( 100 \right )$$This indicator is reported monthly by the U.S. Bureau of Labor Statistics as a result of applying labor force surveys. In this project the U-3 unemployment rate will be used as a reference; however, the Bureau of Labor Statistics presents various versions of the unemployment rate taking different factors into consideration, such as the types of jobs the employed people have or how long the unemployed have been jobless.
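The formula above can be checked with a minimal sketch (the figures below are made-up illustrations, not BLS data):

```python
def unemployment_rate(unemployed: float, labor_force: float) -> float:
    """Unemployment rate as a percentage of the total labor force."""
    return unemployed / labor_force * 100

# hypothetical example: 10 million unemployed out of a 160-million labor force
rate = unemployment_rate(10, 160)  # 6.25 (%)
```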
Now that the data has been gathered for the variables of interest, visual validations are shown for the moments when the indicator is announced. The purpose is to create empirical strategies for different scenarios that will inform the trading system defined in the following paragraphs.
These validations consist of 3 functions. The first function returns a dataframe to validate, the second one visualizes the behaviour of the closing prices, and the last function creates a summary of the strategy in terms of profit and loss.
The third function, called empiric_trade, receives as an argument the dataframe to validate. Once the function gets the data, it takes the price of our currency at the same timestamp the indicator was reported. It also defines a subset covering approximately thirty minutes after that timestamp. Once this subset has been created, it is possible to define the required variables from the known information.
First, for the direction, all it takes is to subtract the first price from the last price. If the result is positive, the closing price is greater than the opening price (when the indicator was announced); if it is negative, the opposite can be inferred. Following this reasoning, since the trading decisions occur every time the indicator is released, the direction can drive the decision making. If the price has a rising tendency, a buy transaction is expected to be executed. On the other hand, with a decreasing tendency, a sell transaction will be performed.
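The direction rule can be sketched in a couple of lines (an illustrative stand-in, not the notebook's empiric_trade implementation):

```python
def trade_direction(prices):
    """Decide buy/sell from the sign of the last-minus-first price move.

    `prices` is the ordered list of closing prices, starting at the
    minute the indicator is announced."""
    change = prices[-1] - prices[0]
    return "buy" if change > 0 else "sell"

# rising window -> buy; falling window -> sell
assert trade_direction([23.00, 23.02, 23.05]) == "buy"
assert trade_direction([23.05, 23.01, 23.00]) == "sell"
```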
The next parameter to be set is the transaction volume. This is simply defined as the median of the volume contained in the dataframe, making it a parameter that changes for each dataframe with a statistical justification.
Both the Take Profit and the Stop Loss are defined in pip units. The function defines ranges between 0 and the maximum or minimum price variation expressed in pips, depending on the transaction. A buy operation will show losses if the price continues to go down, because we paid a higher price for the traded asset. This means that the Take Profit barrier will be a random number between 0 and the maximum price variation. Similarly, the Stop Loss barrier will be defined in a range between 0 and the minimum price variation.
Finally, to get the overall profit and loss, it is enough to subtract the transaction price from the corresponding barrier price, whether it is a Take Profit or a Stop Loss.
For a sell scenario, a loss is perceived if the price continues to go up, as a consequence of having sold the asset at a cheaper price. The Take Profit and Stop Loss ranges are defined in the opposite way to the buy scenario. In addition, since the profit now materializes when the price goes down, the Take Profit barrier price is obtained by subtracting the Take Profit in pip units from the price at the time of the indicator release, while the pips for the Stop Loss are added to that same price.
In this case, to get the overall profit and loss in monetary terms, it is enough to subtract the barrier prices from the price of the asset at the indicator report date.
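A minimal sketch of the barrier and P&L logic just described. The function names and the 0.0001 pip size are illustrative assumptions (consistent with the x10000 pip convention used later in the report), not the notebook's actual implementation:

```python
PIP = 0.0001  # assumed pip size implied by the x10000 convention used later

def barrier_prices(entry: float, tp_pips: float, sl_pips: float, side: str):
    """Convert Take Profit / Stop Loss distances in pips into barrier prices."""
    if side == "buy":          # profit above the entry, loss below it
        return entry + tp_pips * PIP, entry - sl_pips * PIP
    return entry - tp_pips * PIP, entry + sl_pips * PIP  # sell: mirrored

def pnl(entry: float, exit_price: float, volume: float, side: str) -> float:
    """Monetary result of closing the position at `exit_price`."""
    sign = 1 if side == "buy" else -1
    return sign * (exit_price - entry) * volume

tp, sl = barrier_prices(23.00, tp_pips=100, sl_pips=50, side="buy")
profit = pnl(23.00, tp, volume=100_000, side="buy")   # price hits the Take Profit
loss = pnl(23.00, sl, volume=100_000, side="buy")     # price hits the Stop Loss
```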
To exemplify the explanation above, the next chunks contain five validations that illustrate this idea at a numerical level.
### Let's see the docstring for the data_manipulation function
help(fn.data_manipulation)
Help on function data_manipulation in module functions:
data_manipulation(forex_1, forex_2, forex_3, indicator)
This function creates data frames for further use. It only accepts 3 years' worth of data.
It requires information of a designated currency and a chosen indicator.
Parameters
----------
forex_1 : CSV File
The first year of currency data
forex_2 : CSV File
The second year of currency data
forex_3 : CSV File
The third year of currency data
indicator : CSV File
All three years worth of indicator data
help(fn.Event_Data)
Help on function Event_Data in module functions:
Event_Data(usdmxn, unemployment)
This function returns a dataframe for each indicator event. This includes 30 minutes prior to and 30 minutes after the indicator
announcement.
Parameters
----------
usdmxn : Dataframe
All three years worth of forex data
data_validation = fn.data_manipulation(usdmxn_18,usdmxn_19,usdmxn_20,unemployment)
events = fn.Event_Data(data_validation[0],unemployment)
The first validation will be done using the dataframe shown below:
help(fn.validation)
Help on function validation in module functions:
validation(data, n_val)
This function returns a dataframe for a visual and empirical validation.
Parameters
----------
data : Dataframe
Dataframe to validate
n_val: numeric
Defines the number of validation
Return
----------
Trial : Dataframe
val_1 = fn.validation(events,1)
help(vis.val_graph)
Help on function val_graph in module visualizations:
val_graph(df)
This function plots the closing price time series. It also includes a line where the indicator was announced
Parameters
----------
df : Dataframe
Dataframe to validate
Return
----------
fig : Plot
For a visual interpretation of the price behavior, the val_graph function displays a plot of the price series and adds a line where the indicator was announced. In this case, the prices tend to continue going down after the indicator was reported. Considering that the transaction should be done at the date of the indicator report, the advice would be to sell.
vis.val_graph(val_1)
The empirical design of the strategy states that the operation to perform is to "sell" given that the direction is negative. The Take Profit pip variation is set in terms of the minimum variation, while the Stop Loss pip variation is defined in terms of the maximum variation. The profit and loss columns show a monetary result. If the price touches the Take Profit barrier the expected utility would be \$32265.66. On the other hand, the loss if the price reaches the Stop Loss barrier would be -\$23092.09.
fn.empiric_trade(val_1)
| | Operation | Direction | Volume | Takeprofit(pip) | Stoploss(pip) | Profit($) | Loss($) |
|---|---|---|---|---|---|---|---|
| Operation | Sell | -1.0 | 632660.0 | 510 | 365 | 32265.66 | -23092.09 |
The second validation will be done using the dataframe shown below:
val_2 = fn.validation(events,2)
In this second validation, even though the price has a tendency to decrease, at the end of the analyzed time period the price starts increasing. Visually, it can be stated that the direction of the transaction will be positive, given that the last price is higher than the price at the indicator line.
vis.val_graph(val_2)
The empirical design of the strategy states that the operation to perform is to "buy" given that the direction is positive. The Take Profit pip variation is set in terms of the maximum variation, while the Stop Loss pip variation is defined in terms of the minimum variation. The profit and loss columns show a monetary result. If the price touches the Take Profit barrier the expected utility would be \$2588.75. On the other hand, the loss if the price reaches the Stop Loss barrier would be -\$2992.5.
fn.empiric_trade(val_2)
| | Operation | Direction | Volume | Takeprofit(pip) | Stoploss(pip) | Profit($) | Loss($) |
|---|---|---|---|---|---|---|---|
| Operation | Buy | 1.0 | 237500.0 | 109 | 126 | 2588.75 | -2992.5 |
The third validation will be done using the dataframe shown below:
val_3 = fn.validation(events,3)
For this specific scenario, the price tendency is to the upside. Assuming the trade is performed at the red dashed axis, the logical result of the empirical trade is a recommendation to buy once the indicator has been reported, because the direction seems to be positive.
vis.val_graph(val_3)
The empirical design of the strategy states that the operation to perform is to "buy" given that the direction is positive. The Take Profit pip variation is set in terms of the maximum variation, while the Stop Loss pip variation is defined in terms of the minimum variation. The profit and loss columns show a monetary result. If the price touches the Take Profit barrier the expected utility would be \$7397.93. On the other hand, the loss if the price reaches the Stop Loss barrier would be -\$5294.59.
fn.empiric_trade(val_3)
| | Operation | Direction | Volume | Takeprofit(pip) | Stoploss(pip) | Profit($) | Loss($) |
|---|---|---|---|---|---|---|---|
| Operation | Buy | 1.0 | 145057.5 | 510 | 365 | 7397.9325 | -5294.59875 |
The fourth validation will be done using the dataframe shown below:
val_4 = fn.validation(events,4)
Since the time series in the visualization decreases once the indicator is reported, the advised operation would be to sell, based on the fact that holding a long position while the price falls would translate into losses, whereas a sell can be closed later at a lower price.
vis.val_graph(val_4)
The empirical design of the strategy states that the operation to perform is to "sell" given that the direction is negative. The Take Profit pip variation is set in terms of the minimum variation, while the Stop Loss pip variation is defined in terms of the maximum variation. The profit and loss columns show a monetary result. If the price touches the Take Profit barrier the expected utility would be \$1324.62. On the other hand, the loss if the price reaches the Stop Loss barrier would be -\$24.30.
fn.empiric_trade(val_4)
| | Operation | Direction | Volume | Takeprofit(pip) | Stoploss(pip) | Profit($) | Loss($) |
|---|---|---|---|---|---|---|---|
| Operation | Sell | -1.0 | 121525.0 | 109 | 2 | 1324.6225 | -24.305 |
The fifth validation will be done using the dataframe shown below:
val_5 = fn.validation(events,5)
Even though there seem to be two operations at the moment the unemployment rate is announced, with different prices on a decreasing trend, after this specific moment in time there is a visible positive trend in the time series overall.
vis.val_graph(val_5)
The empirical design of the strategy states that the operation to perform is to "buy" given that the direction is positive. The Take Profit pip variation is set in terms of the maximum variation, while the Stop Loss pip variation is defined in terms of the minimum variation. The profit and loss columns show a monetary result. If the price touches the Take Profit barrier the expected utility would be \$40503.30. On the other hand, the loss if the price reaches the Stop Loss barrier would be -\$28987.66.
fn.empiric_trade(val_5)
| | Operation | Direction | Volume | Takeprofit(pip) | Stoploss(pip) | Profit($) | Loss($) |
|---|---|---|---|---|---|---|---|
| Operation | Buy | 1.0 | 794182.5 | 510 | 365 | 40503.3075 | -28987.66125 |
The scenarios presented in the validations occur one after the other. In other words, the validation for the second scenario comes, in time terms, after the validation for the first scenario, and so on. Given this, the position size will not change; it will remain static.
Time series
### Let's see the time series evolution
vis.plot_ts(ts=unemployment,title='Unemployment rate',yaxes="Rate",xaxes='Time')
The unemployment rate has a downward trend from 2018 to March 2020. During the pandemic we can see an impact and a change to an upward trend, specifically during April and May. Despite the dominant downward trend that we can observe from June to December, the rates are still higher than the ones reported during 2018 and 2019.
To test whether our time series has a high degree of autocorrelation with its lagged version, the Durbin-Watson test is used. Autocorrelation is important to determine whether past values of a time series have a relation with future values.
Statistic:
$$\frac{\sum_{t=2}^T((e_t - e_{t-1})^2)}{\sum_{t=1}^Te_t^2}$$Where:
$e$: Represents the residuals of the time series.
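The statistic above is simple enough to sketch directly; this pure-Python illustration is not the statsmodels implementation that fn.dw presumably wraps:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest no autocorrelation,
    values toward 0 positive autocorrelation, values toward 4 negative."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# perfectly alternating residuals -> strong negative autocorrelation (DW > 2)
stat = durbin_watson([1, -1, 1, -1])  # 12 / 4 = 3.0
```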
help(fn.dw)
Help on function dw in module functions:
dw(x, funcs=[<function acf at 0x000001B09E5C0CA0>, <function pacf at 0x000001B09E5C0E50>])
Docstring:
The purpose of this function is to compute the Durbin-Watson test for the time series residuals,
in order to know if there is autocorrelation present.
The PACF and ACF plots are also computed.
Parameters
-------------
x: time series
funcs: an array with the acf, pacf function.
Returns
-------------
Returns three figures, which contain the result of the Durbin-Watson test and the PACF and ACF plots.
References
-------------
https://www.statology.org/durbin-watson-test-python/
fn.dw(x=dt.unemployment['Actual '])
The Durbin-Watson statistic is between 0 and 2; therefore we can say there is positive autocorrelation in the unemployment rate time series, meaning that past values have an effect on the outcome of future values.
The Levene test is a statistical test used to check if the variances from two or more samples are equal (homoscedasticity).
Statistic:
$$ W = \frac{(N-k)\sum_{i=1}^{k}N_{i}(\bar{Z}_{i\cdot} - \bar{Z}_{\cdot\cdot})^2}{(k-1)\sum_{i=1}^{k}\sum_{j=1}^{N_i}(Z_{ij}-\bar{Z}_{i\cdot})^2} $$Where:
Null Hypothesis: Variances are equal (homoscedasticity).
The Bartlett test is also a statistical test used to check if the variances from two or more samples are equal (homoscedasticity).
Statistic:
$$ \chi^{2} = \frac{(N-k)\ln(S^{2}_{p})-\sum_{i=1}^{k}(n_i-1)\ln(S^{2}_{i})}{1+\frac{1}{3(k-1)}\Big(\sum_{i=1}^{k}\frac{1}{n_i-1}-\frac{1}{N-k}\Big)} $$Where:
Null Hypothesis: Variances are equal (homoscedasticity).
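Both tests are available in scipy.stats (levene, bartlett). As an illustration of the Levene statistic itself, here is a pure-Python sketch of the mean-centered variant, where $Z_{ij} = |x_{ij} - \bar{x}_i|$:

```python
def levene_w(groups):
    """Levene W statistic with Z_ij = |x_ij - mean_i| (mean-centered variant)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    means = [sum(g) / len(g) for g in groups]
    z = [[abs(x - m) for x in g] for g, m in zip(groups, means)]
    z_group = [sum(zi) / len(zi) for zi in z]             # group means of Z
    z_grand = sum(sum(zi) for zi in z) / n_total          # grand mean of Z
    num = (n_total - k) * sum(n * (zg - z_grand) ** 2
                              for n, zg in zip(sizes, z_group))
    den = (k - 1) * sum((zij - zg) ** 2
                        for zi, zg in zip(z, z_group) for zij in zi)
    return num / den

# two groups with identical spread -> W is (numerically) zero
w = levene_w([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
```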
help(fn.var_test)
Help on function var_test in module functions:
var_test(x, alfa: float)
Docstring
The purpose of this function is to test whether the time series has the heteroscedasticity property.
Parameters
-------------------
x: time series
alfa: 1- significance level
Returns
--------------------
A chart with the results of Levene and Bartlett statistical tests.
fn.var_test(x=dt.unemployment['Actual '],alfa=0.05)
After testing our time series with the Levene and Bartlett tests, both null hypotheses (the homoscedasticity property) were rejected with an $\alpha$ of 0.05. Therefore, heteroscedasticity is present in our time series, meaning that the variance is not constant over time.
The following normality tests are used:
It is a statistical test used to verify whether a set of data follows a normal distribution, published by Samuel Shapiro and Martin Bradbury Wilk in 1965.
Statistic:
$$ W = \frac{\Big(\sum_{i=1}^{n}\alpha_{i}x_{(i)}\Big)^{2}}{\sum_{i=1}^{n} (x_{i} - \bar{x})^2 } $$Where:
Null Hypothesis $H_0$: The sample follows a normal distribution.
Statistical test for normality (D'Agostino's test), based on kurtosis and skewness.
The sample skewness and kurtosis are defined as:
$$ g_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\Big(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^{2}\Big)^{3/2}} $$$$ g_2 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^4}{\Big(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^{2}\Big)^{2}} $$After transformations $Z_1$ and $Z_2$ are applied to $g_1$ and $g_2$, the statistic results:
$$ K^2 = Z_{1}(g_{1})^2 + Z_{2}(g_{2})^2 $$Where:
Null Hypothesis $H_0$: The sample follows a normal distribution.
Goodness-of-fit test, which uses the kurtosis and the skewness of the sample to test whether the data follows a normal distribution.
Skewness & Kurtosis:
$$ S = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\Big(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^{2}\Big)^{3/2}} $$$$ K = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^4}{\Big(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^{2}\Big)^{2}} $$Jarque-Bera Statistic:
$${JB = \frac{n}{6}\Big(S^{2} + \frac{1}{4}(K-3)^{2}\Big)}$$Where:
Null Hypothesis $H_0$: The sample follows a normal distribution.
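All of these tests are exposed by scipy.stats (shapiro, normaltest, jarque_bera, anderson). The Jarque-Bera statistic in particular is simple enough to sketch directly from the formulas above:

```python
def jarque_bera_stat(x):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), with sample skewness S and kurtosis K."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n   # central moments
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

jb = jarque_bera_stat([-1.0, 0.0, 1.0])  # symmetric sample: the skewness term is 0
```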
help(fn.normality_test)
Help on function normality_test in module functions:
normality_test(x, alfa: float, funcs: list = [<function shapiro at 0x000001B09D050670>, <function normaltest at 0x000001B09CFFCB80>, <function jarque_bera at 0x000001B09CFFCF70>, <function anderson at 0x000001B09D050E50>])
Docstring
The purpose of this function is to compute normality tests for a time series with the following statistical tests:
Shapiro, D'Agostino, Jarque-Bera and Anderson-Darling.
Parameters
------------
x: time series
alfa: 1-significance level
funcs: a predefined list with normality test functions.
Return
------------
A chart with the normality test results.
References
--------------
https://plotly.com/python/v3/normality-test/
fn.normality_test(x=dt.unemployment['Actual '],alfa=0.05)
help(vis.hist)
Help on function hist in module visualizations:
hist(x, title: str, yaxes: str, xaxes: str)
Docstring
Function that plots the time series Histogram.
Parameters
--------------------
x: time series.
title: the title of the plot.
Returns
--------------------
histogram of the time series.
After computing the different statistical tests, we can conclude that our data does not follow a normal distribution. Notice that the value reported for the Anderson-Darling test is bigger than the chosen alpha, yet $H_0$ is still rejected: Anderson-Darling does not report a conventional p-value but a statistic that is compared against critical values, and $H_0$ is rejected when the statistic exceeds the critical value for the chosen significance level.
We can also confirm the lack of normality in our time series by looking at the histogram and the QQ-plot.
vis.hist(x=dt.unemployment['Actual '],title='Unemployment Rate Histogram',yaxes="Count",xaxes="Rates")
help(fn.qq)
Help on function qq in module functions:
qq(x, qqplot_data)
Docstring
The purpose of this function is to plot a QQ-plot.
Parameters
--------------
x: time series
qqplot_data: figure
Returns
--------------
A qqplot figure.
References
--------------
https://plotly.com/python/v3/normality-test/
qqplot_data=qqplot(dt.unemployment['Actual '], line='s').gca().lines
fn.qq(x=dt.unemployment['Actual '],qqplot_data=qqplot_data)
To check whether our time series indicator has a seasonal component, we use the Kruskal-Wallis test, a non-parametric test whose purpose is to determine whether n samples originate from the same distribution.
Statistic: $${Q=\frac{SS_{t}}{SS_{e}}}$$
Where:
Null Hypothesis: all samples (in a time series context: months, quarters, ...) come from the same distribution. If the null hypothesis is rejected, there is evidence of seasonality.
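As an illustration, the Kruskal-Wallis H statistic (the Q above) can be sketched in pure Python; for the seasonality check, the groups would be the twelve monthly sub-samples. Unlike scipy.stats.kruskal, this sketch assumes all pooled values are distinct (no tie correction):

```python
def kruskal_h(groups):
    """Kruskal-Wallis H on the pooled ranks (assumes no tied values)."""
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}     # rank 1 = smallest
    n = len(pooled)
    rank_term = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

# clearly separated groups -> large H, evidence against a common distribution
h = kruskal_h([[1, 2, 3], [4, 5, 6]])
```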
help(fn.seasonality)
Help on function seasonality in module functions:
seasonality(x, alfa: float, m: int)
Docstring
The purpose of this function is to check if the time series has a seasonal component,
with the Kruskal-Wallis test.
Parameters
----------------
x: time series
alfa: 1-significance level
m: periods of the time series; 12 if monthly, 4 if quarterly, etc.
Returns
----------------
A chart with the results.
Reference
----------------
https://knk00.medium.com/how-to-determine-seasonality-without-plots-f18cee913b95
fn.seasonality(x=dt.unemployment['Actual '],alfa=0.05,m=12)
The Kruskal-Wallis test rejects the null hypothesis with an $\alpha$ of 0.05; therefore at least one sample has a different median from the rest. In a time series context this can be interpreted as at least one sample stochastically dominating at least one other sample, so a seasonal component is present in the time series.
A process is defined as stationary when the data has the following property: mean, variance and autocorrelation structure do not change over time.
The Augmented Dickey-Fuller test is the statistical test used to check whether our time series indicator has this property. It is a unit root test; the intuition behind it is that it determines how strong the trend component in a time series is.
Statistic:
$${DF_{\tau}=\frac{\hat{\gamma}}{SE(\hat{\gamma})}}$$Where:
The null hypothesis of the test is that the time series can be represented by a unit root, i.e., that it is not stationary. The alternative hypothesis (rejecting the null hypothesis) is that the time series is stationary.
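The notebook's fn.stationarity presumably wraps statsmodels' adfuller. To make the unit-root intuition concrete, here is a sketch of the simplest (non-augmented, no-constant) Dickey-Fuller regression $\Delta y_t = \gamma y_{t-1} + \epsilon_t$, where the test statistic is $\hat{\gamma}$ divided by its standard error, matching the $DF_{\tau}$ formula above:

```python
def dickey_fuller_stat(y):
    """t-statistic of gamma in the regression dy_t = gamma * y_{t-1} + e_t."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    ylag = y[:-1]
    gamma = sum(a * b for a, b in zip(ylag, dy)) / sum(a * a for a in ylag)
    resid = [d - gamma * a for a, d in zip(ylag, dy)]
    s2 = sum(r * r for r in resid) / (len(dy) - 1)       # residual variance
    se = (s2 / sum(a * a for a in ylag)) ** 0.5          # SE of gamma-hat
    return gamma / se

# a strongly mean-reverting (stationary-looking) toy series -> negative statistic
stat = dickey_fuller_stat([1.0, -0.5, 0.7, -0.3, 0.6, -0.4, 0.5])
```

In practice the statistic is compared against Dickey-Fuller critical values rather than ordinary t-tables.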
help(fn.stationarity)
Help on function stationarity in module functions:
stationarity(x, alfa: float)
Docstring
The purpose of this function is to test if the time series is stationary using the Augmented Dickey-Fuller test.
Parameters
-----------------
x: time series
alfa: 1-significance level
Returns
----------------
A chart with the results of the test.
References
----------------
https://machinelearningmastery.com/time-series-data-stationary-python/
fn.stationarity(x=dt.unemployment['Actual '],alfa=.05)
After computing the Augmented Dickey-Fuller test, the p-value is bigger than an $\alpha$ of 0.05; therefore the null hypothesis can't be rejected and we conclude that our time series is not stationary.
This statement is consistent with the output of the normality and heteroscedasticity tests: since the variance is not constant over time, the time series cannot be stationary.
To check whether our time series has outliers, the IQR (Interquartile Range) criterion is used. The IQR is the difference between the third and the first quartile. To use this criterion, the data is divided into quarters; the IQR represents the middle half of the data that lies between the upper and lower quartiles.
$$ IQR = Q_3 - Q_1 $$An upper limit is defined; observations above this limit are considered outliers.
Upper limit = $Q_3 + 1.5 \cdot IQR$
A lower limit is defined; observations below this limit are considered outliers.
Lower limit = $Q_1 - 1.5 \cdot IQR$
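The criterion can be sketched in a few lines, using the "inclusive" quartile method of Python's statistics module (the notebook's fn.iqr may compute quartiles slightly differently):

```python
import statistics

def iqr_outliers(x):
    """Values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in x if v < lower or v > upper]

# 100 lies far above the upper fence and is flagged
outliers = iqr_outliers([1, 2, 3, 4, 100])
```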
help(fn.iqr)
Help on function iqr in module functions:
iqr(x)
Docstring
The purpose of this function is to check if a time series has outliers using the
IQR criterion.
Parameters
------------------
x: time series
Returns
------------------
A dataframe with the values that are considered outliers.
References
------------------
https://www.statology.org/interquartile-range-python/
warnings.filterwarnings("ignore")
fn.iqr(x=dt.unemployment)
28    0.147
29    0.133
30    0.111
31    0.102
32    0.084
33    0.079
34    0.069
35    0.067
Name: Actual , dtype: float64
help(vis.boxplot)
Help on function boxplot in module visualizations:
boxplot(x, title: str, yaxes: str, xaxes: str)
Docstring
Function that plots the time series boxplot.
Parameters
--------------------
x: time series.
title: the title of the plot.
Returns
--------------------
boxplot of the time series.
vis.boxplot(x=dt.unemployment['Actual '],title='Unemployment Rate Boxplot',yaxes='Sample',xaxes="Rates")
The output of our function shows that we have eight atypical values in the time series indicator, which lie above the upper limit defined by our criterion; they can be visualized in the boxplot. All of them occur during 2020, a period characterized by a high unemployment rate, likely due to the pandemic crisis.
Each event is classified according to the following rules, which represent occurrence scenarios:
| Scenario | Rule |
|---|---|
| A | Actual $\geq$ Consensus $\geq$ Previous |
| B | Actual $\geq$ Consensus $<$ Previous |
| C | Actual $<$ Consensus $\leq$ Previous |
| D | Actual $<$ Consensus $<$ Previous |
To make this classification, a function is used, which consists of a for loop that iterates over the data frame containing each scenario, i.e., each monthly report of the indicator. Within this loop, the percentages reported as actual, consensus and prior are compared through multiple conditions, following the aforementioned rules, and the corresponding scenario is assigned.
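A minimal sketch of the per-row classification logic. Note that rules C and D overlap when Actual < Consensus < Prior; this sketch resolves the ambiguity by checking the strict rule D first, and fn.Scenario_Clasification may resolve it differently:

```python
def classify(actual: float, consensus: float, prior: float) -> str:
    """Map one indicator release to its occurrence scenario A-D."""
    if actual >= consensus >= prior:
        return "A"
    if actual >= consensus and consensus < prior:
        return "B"
    if actual < consensus < prior:       # strict rule D, checked first
        return "D"
    if actual < consensus <= prior:      # remaining C case: consensus == prior
        return "C"
    return "unclassified"                # actual < consensus and consensus > prior

scenario = classify(4.1, 4.1, 4.1)  # first release in the data: all equal -> "A"
```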
help(fn.Scenario_Clasification)
Help on function Scenario_Clasification in module functions:
Scenario_Clasification(indicator)
This function classifies the indicator data according to pre-set scenarios
Parameters
----------
indicator : Dataframe
All three years worth of indicator data
data = fn.data_manipulation(usdmxn_18,usdmxn_19,usdmxn_20,unemployment)
usdmxn = data[0]
unemployment = data[1]
unemployment_df = fn.Scenario_Clasification(unemployment)
display(unemployment_df.head())
display(unemployment_df.tail())
| Datetime | Country/Region | Event | Importance | Period | Actual | Consensus | Prior | Scenario |
|---|---|---|---|---|---|---|---|---|
| 2018-01-05 07:30:00 | United States | Unemployment Rate | High | DEC | 4.1% | 4.1% | 4.1% | A |
| 2018-02-02 07:30:00 | United States | Unemployment Rate | High | JAN | 4.1% | 4.1% | 4.1% | A |
| 2018-03-09 07:30:00 | United States | Unemployment Rate | High | FEB | 4.1% | 4.0% | 4.1% | B |
| 2018-04-06 07:30:00 | United States | Unemployment Rate | High | MAR | 4.1% | 4.0% | 4.1% | B |
| 2018-05-04 07:30:00 | United States | Unemployment Rate | High | APR | 3.9% | 4.0% | 4.1% | D |
| Datetime | Country/Region | Event | Importance | Period | Actual | Consensus | Prior | Scenario |
|---|---|---|---|---|---|---|---|---|
| 2020-08-07 07:30:00 | United States | Unemployment Rate | High | JUL | 10.2% | 10.5% | 11.1% | D |
| 2020-09-04 07:30:00 | United States | Unemployment Rate | High | AUG | 8.4% | 9.8% | 10.2% | C |
| 2020-10-02 07:30:00 | United States | Unemployment Rate | High | SEP | 7.9% | 8.2% | 8.4% | D |
| 2020-11-06 07:30:00 | United States | Unemployment Rate | High | OCT | 6.9% | 7.7% | 7.9% | D |
| 2020-12-04 07:30:00 | United States | Unemployment Rate | High | NOV | 6.7% | 6.8% | 6.9% | D |
Through a function, a data frame is created for each event, consisting of one-minute prices from 30 minutes before each moment the indicator was released to 30 minutes after, resulting in 36 data frames.
The function creates the data frames through a loop that joins two data frames: the one containing the events (each release) and a data frame of the USDMXN currency with prices per minute. The release date is searched for in the price data frame, and the 30 minutes before and 30 minutes after are extracted using datetime.timedelta.
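The windowing step can be sketched as follows; `event_window` and the toy one-minute grid are illustrative stand-ins for fn.Event_Data, not its actual implementation:

```python
import pandas as pd
from datetime import timedelta

def event_window(prices: pd.DataFrame, release: pd.Timestamp,
                 minutes: int = 30) -> pd.DataFrame:
    """Slice of `prices` from `minutes` before to `minutes` after `release`."""
    start = release - timedelta(minutes=minutes)
    end = release + timedelta(minutes=minutes)
    mask = (prices["timestamp"] >= start) & (prices["timestamp"] <= end)
    return prices.loc[mask]

# hypothetical one-minute price grid around a release at 07:30
ts = pd.date_range("2018-01-05 07:00", "2018-01-05 08:00", freq="1min")
prices = pd.DataFrame({"timestamp": ts, "close": range(len(ts))})
window = event_window(prices, pd.Timestamp("2018-01-05 07:30"))
```

Looping this over the 36 release timestamps yields one window dataframe per event.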
warnings.filterwarnings('ignore')
eventos_df = fn.Event_Data(data[0], unemployment)
eventos_df[0].head()
| | index | timestamp | open | high | low | close | volume | date |
|---|---|---|---|---|---|---|---|---|
| 0 | 3297 | 2018-01-05 07:00:00 | 23.041475 | 23.046785 | 23.041475 | 23.046785 | 520680.0 | 2018-01-05 |
| 1 | 3298 | 2018-01-05 07:01:00 | 23.046785 | 23.046785 | 23.046785 | 23.046785 | 238645.0 | 2018-01-05 |
| 2 | 3299 | 2018-01-05 07:04:00 | 23.041475 | 23.041475 | 23.041475 | 23.041475 | 217000.0 | 2018-01-05 |
| 3 | 3300 | 2018-01-05 07:06:00 | 23.046785 | 23.046785 | 23.046785 | 23.046785 | 21695.0 | 2018-01-05 |
| 4 | 3301 | 2018-01-05 07:08:00 | 23.046785 | 23.046785 | 23.046785 | 23.046785 | 151865.0 | 2018-01-05 |
The Metrics function defined in the functions.py file performs all the calculations for the corresponding metrics:
It displays a dataframe with the metrics for every time the unemployment rate is published. All the previously mentioned metrics are reported in pips (the price variation multiplied by 10,000).
This specific metric shows the tendency in a certain time series. Applied to the prices of futures contracts, it is calculated by subtracting the opening price at the moment the indicator is published from the closing price 30 minutes after the announcement. The sign is the only relevant part of this result: if it is positive (+1), the closing price was higher than the opening; otherwise the sign will be negative (-1).
**Bullish Pip.** To obtain this metric, take the maximum of the high prices over the thirty minutes following the announcement and subtract the opening price at t=0. This gives the largest upward variation, in pips, over the analyzed window.
**Bear Pip.** The mirror image of the Bullish Pip: subtract the minimum of the low prices over the thirty-minute window after the publication from the opening price at t=0. It measures the largest downward variation, in pips, during the analyzed period.
**Volatility.** This metric represents, in a statistical sense, the dispersion of prices in the window: the maximum of the high prices minus the minimum of the low prices.
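Taken together, the four metrics above can be sketched for a single event window as follows (a hypothetical helper; the project's consolidated implementation is `fn.Metrics`):

```python
import pandas as pd

PIP = 10_000  # 1.0 of USDMXN price movement = 10,000 pips, as defined above

def event_metrics(window: pd.DataFrame) -> dict:
    """Direction, Bullish Pip, Bear Pip and Volatility for one post-release window.

    `window` holds the minute bars from t=0 (the release) to t=30.
    Hypothetical helper; the project's version lives in fn.Metrics.
    """
    open_t0 = window["open"].iloc[0]      # open when the indicator is published
    close_t30 = window["close"].iloc[-1]  # close 30 minutes later
    high_max = window["high"].max()
    low_min = window["low"].min()
    return {
        "Direction": 1 if close_t30 > open_t0 else -1,
        "Bullish_Pip": (high_max - open_t0) * PIP,
        "Bear_Pip": (open_t0 - low_min) * PIP,
        "Volatility": (high_max - low_min) * PIP,
    }
```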
help(fn.Metrics)
Help on function Metrics in module functions:
Metrics(eventos_df, unemployment_df)
This function returns a consolidated dataframe for metrics like direction, bullish pip, bear pip and volatility.
Parameters
----------
eventos_df : Dataframe
Dataframe that contains trading information related to the chosen currency for each indicator event.
unemployment_df : Dataframe
Dataframe that contains indicator information.
df_escenarios= fn.Metrics(eventos_df, unemployment_df)
display(df_escenarios.head())
display(df_escenarios.tail())
| Scenario | Direction | Bullish_Pip | Bear_Pip | Volatility | |
|---|---|---|---|---|---|
| Datetime | |||||
| 2018-01-05 07:30:00 | A | 1 | 802.085679 | 0.000000 | 1227.597677 |
| 2018-02-02 07:30:00 | A | 1 | 519.736522 | 0.000000 | 613.967477 |
| 2018-03-09 07:30:00 | B | 1 | 142.426418 | 236.964174 | 379.390592 |
| 2018-04-06 07:30:00 | B | 1 | 134.916728 | 179.622243 | 359.549382 |
| 2018-05-04 07:30:00 | D | -1 | 0.000000 | 396.470461 | 694.752120 |
| Scenario | Direction | Bullish_Pip | Bear_Pip | Volatility | |
|---|---|---|---|---|---|
| Datetime | |||||
| 2020-08-07 07:30:00 | D | 1 | 159.383039 | 318.105955 | 690.343025 |
| 2020-09-04 07:30:00 | C | 1 | 146.095666 | 48.655592 | 534.740381 |
| 2020-10-02 07:30:00 | D | -1 | 0.000000 | 448.722442 | 949.425028 |
| 2020-11-06 07:30:00 | D | 1 | 712.439193 | 177.360020 | 1287.769388 |
| 2020-12-04 07:30:00 | D | 1 | 280.056146 | 39.944059 | 479.808748 |
Using the training data, the function "fn.decisions" creates a dataframe with the strategy to place orders according to the scenario.
help(fn.decisions)
Help on function decisions in module functions:
decisions(df_escenarios, usdmxn)
Function that creates a dataframe with the designed
strategy to place orders according to the scenario.
Parameters
----------
df_escenarios:dataframe
'Datetime': timestamp, date of the indicator
'Scenario': A, B, C or D
'Direction': -1 if close price < open, 1 if close price > open
'Bullish_Pip': difference between the highest price (t_0:t_30) and the open price t_0
'Bear_Pip': difference between the open price t_0 and the lowest price (t_0:t_30)
'Volatility': difference between the highest and the lowest price
usdmxn:dataframe of the prices of the currency
Returns
-------
df_de: dataframe
dataframe with the following information
'Scenario': A, B, C or D
'Operation': Sell or Buy
'SL': stop loss
'TP': take profit
'Volume': optimal volume
df_decisions = fn.decisions(df_escenarios, usdmxn)
df_decisions
| Scenario | Operation | SL | TP | Volume | |
|---|---|---|---|---|---|
| 0 | A | Sell | 353.0 | 228.0 | 729960.0 |
| 1 | B | Sell | 188.0 | 256.0 | 47170.0 |
| 2 | C | Buy | 228.0 | 919.0 | 1894200.0 |
| 3 | D | Buy | 407.0 | 188.0 | 1637755.0 |
To assess the performance of the previous trading strategy we run a back test that applies the above configuration to our trading decisions; this tells us how far we are from a good performance of our algorithm.
To work with this and other configurations we set a training set and a test set. They cover different periods but share essentially the same structure of intraday price reporting for USDMXN. The indicator reporting dates must be split in the same way, since, as stated before, these are the main trigger for a trading decision.
### Let's see our training set
display(dt.training_usdmxn.head())
display(dt.training_usdmxn.tail())
| timestamp | open | high | low | close | volume | date | |
|---|---|---|---|---|---|---|---|
| 0 | 2018-01-01 18:00:00 | 23.590469 | 23.596036 | 23.590469 | 23.596036 | 1292590.0 | 2018-01-01 |
| 1 | 2018-01-01 18:01:00 | 23.590469 | 23.590469 | 23.590469 | 23.590469 | 466290.0 | 2018-01-01 |
| 2 | 2018-01-01 18:05:00 | 23.590469 | 23.590469 | 23.590469 | 23.590469 | 423900.0 | 2018-01-01 |
| 3 | 2018-01-01 18:07:00 | 23.584906 | 23.590469 | 23.584906 | 23.590469 | 254340.0 | 2018-01-01 |
| 4 | 2018-01-01 18:08:00 | 23.590469 | 23.596036 | 23.590469 | 23.596036 | 423800.0 | 2018-01-01 |
| timestamp | open | high | low | close | volume | date | |
|---|---|---|---|---|---|---|---|
| 248982 | 2019-01-01 23:19:00 | 21.978022 | 21.978022 | 21.978022 | 21.978022 | 45500.0 | 2019-01-01 |
| 248983 | 2019-01-01 23:20:00 | 21.978022 | 21.978022 | 21.978022 | 21.978022 | 45500.0 | 2019-01-01 |
| 248984 | 2019-01-01 23:24:00 | 21.973193 | 21.973193 | 21.973193 | 21.973193 | 22755.0 | 2019-01-01 |
| 248985 | 2019-01-01 23:38:00 | 21.978022 | 21.978022 | 21.978022 | 21.978022 | 45500.0 | 2019-01-01 |
| 248986 | 2019-01-01 23:46:00 | 21.978022 | 21.978022 | 21.978022 | 21.978022 | 68250.0 | 2019-01-01 |
### Now let's display our test data set
display(dt.test_usdmxn.head())
display(dt.test_usdmxn.tail())
| timestamp | open | high | low | close | volume | date | |
|---|---|---|---|---|---|---|---|
| 248987 | 2019-01-02 00:07:00 | 21.978022 | 21.978022 | 21.978022 | 21.978022 | 45500.0 | 2019-01-02 |
| 248988 | 2019-01-02 00:08:00 | 21.978022 | 21.978022 | 21.978022 | 21.978022 | 182000.0 | 2019-01-02 |
| 248989 | 2019-01-02 00:14:00 | 21.982853 | 21.982853 | 21.982853 | 21.982853 | 68235.0 | 2019-01-02 |
| 248990 | 2019-01-02 00:16:00 | 21.982853 | 21.997360 | 21.982853 | 21.997360 | 2568490.0 | 2019-01-02 |
| 248991 | 2019-01-02 00:17:00 | 21.992523 | 21.992523 | 21.992523 | 21.992523 | 68205.0 | 2019-01-02 |
| timestamp | open | high | low | close | volume | date | |
|---|---|---|---|---|---|---|---|
| 472923 | 2020-01-02 23:38:00 | 19.884669 | 19.884669 | 19.884669 | 19.884669 | 125725.0 | 2020-01-02 |
| 472924 | 2020-01-02 23:41:00 | 19.884669 | 19.884669 | 19.884669 | 19.884669 | 477755.0 | 2020-01-02 |
| 472925 | 2020-01-02 23:49:00 | 19.888624 | 19.888624 | 19.888624 | 19.888624 | 25140.0 | 2020-01-02 |
| 472926 | 2020-01-02 23:51:00 | 19.888624 | 19.888624 | 19.888624 | 19.888624 | 75420.0 | 2020-01-02 |
| 472927 | 2020-01-02 23:59:00 | 19.892580 | 19.892580 | 19.892580 | 19.892580 | 1080805.0 | 2020-01-02 |
### Let's see the scenario classification for the training period
clasification = fn.Scenario_Clasification(dt.unemployment)
clasification_train = clasification[(clasification.index>=pd.to_datetime('2018-01-01')) &
(clasification.index<=pd.to_datetime('2019-01-01'))]
display(clasification_train.head())
display(clasification_train.tail())
| Country/Region | Event | Importance | Period | Actual | Consensus | Prior | Scenario | |
|---|---|---|---|---|---|---|---|---|
| Datetime | ||||||||
| 2018-01-05 07:30:00 | United States | Unemployment Rate | High | DEC | 0.041 | 0.041 | 0.041 | A |
| 2018-02-02 07:30:00 | United States | Unemployment Rate | High | JAN | 0.041 | 0.041 | 0.041 | A |
| 2018-03-09 07:30:00 | United States | Unemployment Rate | High | FEB | 0.041 | 0.040 | 0.041 | B |
| 2018-04-06 07:30:00 | United States | Unemployment Rate | High | MAR | 0.041 | 0.040 | 0.041 | B |
| 2018-05-04 07:30:00 | United States | Unemployment Rate | High | APR | 0.039 | 0.040 | 0.041 | D |
| Country/Region | Event | Importance | Period | Actual | Consensus | Prior | Scenario | |
|---|---|---|---|---|---|---|---|---|
| Datetime | ||||||||
| 2018-08-03 07:30:00 | United States | Unemployment Rate | High | JUL | 0.039 | 0.039 | 0.040 | B |
| 2018-09-07 07:30:00 | United States | Unemployment Rate | High | AUG | 0.039 | 0.038 | 0.039 | B |
| 2018-10-05 07:30:00 | United States | Unemployment Rate | High | SEP | 0.037 | 0.038 | 0.039 | D |
| 2018-11-02 06:30:00 | United States | Unemployment Rate | High | OCT | 0.037 | 0.037 | 0.037 | A |
| 2018-12-07 07:30:00 | United States | Unemployment Rate | High | NOV | 0.037 | 0.037 | 0.038 | B |
### Now let's see the classification for the test period
clasification_test = clasification[(clasification.index>=pd.to_datetime('2019-01-02')) &
(clasification.index<=pd.to_datetime('2020-01-02'))]
display(clasification_test.head())
display(clasification_test.tail())
| Country/Region | Event | Importance | Period | Actual | Consensus | Prior | Scenario | |
|---|---|---|---|---|---|---|---|---|
| Datetime | ||||||||
| 2019-01-04 07:30:00 | United States | Unemployment Rate | High | DEC | 0.039 | 0.037 | 0.037 | A |
| 2019-02-01 07:30:00 | United States | Unemployment Rate | High | JAN | 0.040 | 0.039 | 0.039 | A |
| 2019-03-08 07:30:00 | United States | Unemployment Rate | High | FEB | 0.038 | 0.039 | 0.040 | D |
| 2019-04-05 06:30:00 | United States | Unemployment Rate | High | MAR | 0.038 | 0.038 | 0.038 | A |
| 2019-05-03 07:30:00 | United States | Unemployment Rate | High | APR | 0.036 | 0.038 | 0.038 | C |
| Country/Region | Event | Importance | Period | Actual | Consensus | Prior | Scenario | |
|---|---|---|---|---|---|---|---|---|
| Datetime | ||||||||
| 2019-08-02 07:30:00 | United States | Unemployment Rate | High | JUL | 0.037 | 0.037 | 0.037 | A |
| 2019-09-06 07:30:00 | United States | Unemployment Rate | High | AUG | 0.037 | 0.037 | 0.037 | A |
| 2019-10-04 07:30:00 | United States | Unemployment Rate | High | SEP | 0.035 | 0.037 | 0.037 | C |
| 2019-11-01 06:30:00 | United States | Unemployment Rate | High | OCT | 0.036 | 0.036 | 0.035 | A |
| 2019-12-06 07:30:00 | United States | Unemployment Rate | High | NOV | 0.035 | 0.036 | 0.036 | C |
With all the information required we proceed to run our back test in order to understand what's happening with the strategy.
df_decisions
| Scenario | Operation | SL | TP | Volume | |
|---|---|---|---|---|---|
| 0 | A | Sell | 353.0 | 228.0 | 729960.0 |
| 1 | B | Sell | 188.0 | 256.0 | 47170.0 |
| 2 | C | Buy | 228.0 | 919.0 | 1894200.0 |
| 3 | D | Buy | 407.0 | 188.0 | 1637755.0 |
### Back test for A scenario. (Sell order)
initial_cap = 100000
fn.get_trading_summary(dt.training_usdmxn, clasification_train[clasification_train['Scenario']=='A'], df_decisions['SL'][0],
df_decisions['TP'][0], df_decisions['Volume'][0], initial_cap, 'A')
| Scenario | Operation | Volume | Result | Pip Up | Pip Down | Capital | Cumulative Capital | |
|---|---|---|---|---|---|---|---|---|
| Datetime | ||||||||
| 2018-07-06 07:30:00 | A | Sell | 729960.0 | Won | 353.0 | 228.0 | 807.656561 | 98524.713247 |
| 2018-11-02 06:30:00 | A | Sell | 729960.0 | Won | 353.0 | 228.0 | 988.659142 | 99513.372389 |
### Back test for B scenario. (Sell order)
fn.get_trading_summary(dt.training_usdmxn, clasification_train[clasification_train['Scenario']=='B'], df_decisions['SL'][1],
df_decisions['TP'][1], df_decisions['Volume'][1], initial_cap, 'B')
| Scenario | Operation | Volume | Result | Pip Up | Pip Down | Capital | Cumulative Capital | |
|---|---|---|---|---|---|---|---|---|
| Datetime | ||||||||
| 2018-03-09 07:30:00 | B | Sell | 47170.0 | Lost | 188.0 | 256.0 | -41.115711 | 99958.884289 |
| 2018-04-06 07:30:00 | B | Sell | 47170.0 | Won | 188.0 | 256.0 | 59.911092 | 100018.795381 |
| 2018-08-03 07:30:00 | B | Sell | 47170.0 | Lost | 188.0 | 256.0 | -60.345416 | 99958.449965 |
| 2018-09-07 07:30:00 | B | Sell | 47170.0 | Lost | 188.0 | 256.0 | -41.214504 | 99917.235461 |
| 2018-12-07 07:30:00 | B | Sell | 47170.0 | Won | 188.0 | 256.0 | 97.682927 | 100014.918388 |
### Back test for C scenario. (Buy order)
fn.get_trading_summary(dt.training_usdmxn, clasification_train[clasification_train['Scenario']=='C'], df_decisions['TP'][2],
df_decisions['SL'][2], df_decisions['Volume'][2], initial_cap, 'C')
| Scenario | Operation | Volume | Result | Pip Up | Pip Down | Capital | Cumulative Capital | |
|---|---|---|---|---|---|---|---|---|
| Datetime | ||||||||
| 2018-06-01 07:30:00 | C | Buy | 1894200.0 | Won | 919.0 | 228.0 | 7509.654851 | 107509.654851 |
### Back test for D scenario. (Buy order)
fn.get_trading_summary(dt.training_usdmxn, clasification_train[clasification_train['Scenario']=='D'], df_decisions['TP'][3],
df_decisions['SL'][3], df_decisions['Volume'][3], initial_cap, 'D')
| Scenario | Operation | Volume | Result | Pip Up | Pip Down | Capital | Cumulative Capital | |
|---|---|---|---|---|---|---|---|---|
| Datetime |
From the back test results we can see a positive overall balance, meaning we have developed a simple but effective trading strategy.
### For A scenario let's see the limits
limits = fn.limits(df_escenarios)
limit_low_bull_a, limit_high_bull_a, limit_low_bear_a, limit_high_bear_a = limits[0]
limit_low_bull_b, limit_high_bull_b, limit_low_bear_b, limit_high_bear_b = limits[1]
limit_low_bull_c, limit_high_bull_c, limit_low_bear_c, limit_high_bear_c = limits[2]
limit_low_bull_d, limit_high_bull_d, limit_low_bear_d, limit_high_bear_d = limits[3]
As we've been saying throughout the notebook, our trading system is based on the reported value of the US unemployment rate. In that sense we define four scenarios, comparing the reported value (Actual) against the economic expectations (Consensus) and the previous report (Prior):
A: A clear increase: the reported value meets or exceeds the expectations, and the expectations themselves are at or above the previous value, so the report surpasses both.
B: The reported value is greater than or equal to the expectations, but the expectations are below the previous value; even in this defined scenario the actual value may end up either above or below the previous one.
C: The reported value is below the expectations, while the expectations are at or above the previous value; again, the actual value may be above or below the previous one.
D: A clear decrease in the unemployment rate: the expectations are below the previous value, and the reported value is even lower than the expectations.
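The four cases reduce to two comparisons, Actual vs. Consensus and Consensus vs. Prior. The following is an assumed reconstruction inferred from the classification tables above, not necessarily the exact logic of `fn.Scenario_Clasification`:

```python
def classify_scenario(actual: float, consensus: float, prior: float) -> str:
    """Scenario label as inferred from the tables shown earlier (assumed rule)."""
    if actual >= consensus:
        return "A" if consensus >= prior else "B"
    return "C" if consensus >= prior else "D"
```

For example, the 2018-03-09 row (Actual 0.041, Consensus 0.040, Prior 0.041) yields "B", matching the training table.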
Having defined the possible scenarios, we now need to turn this information into a trading strategy, that is, into the signal for a specific operation.
We define a single position each time the indicator is reported; that operation is closed when one of the limits defined by the Take Profit and Stop Loss is touched.
A natural question is: what happens if the price never touches the Take Profit or Stop Loss limits? In that case the designed algorithm closes the operation just before the next report of the indicator is published, so the operation is closed with the balance at that timestamp.
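Under the assumption that the barriers are checked bar by bar against the minute highs and lows (with Take Profit checked first when both could be touched in the same bar), the exit rule just described can be sketched as:

```python
PIP = 10_000  # pips per 1.0 of USDMXN price movement, as used above

def resolve_trade(bars, operation, entry, tp_pips, sl_pips):
    """Walk the minute bars after entry and settle the position.

    bars: list of (high, low, close) tuples, one per minute until the next release.
    Returns (result, pnl_in_pips). Hypothetical sketch of the rule described above.
    """
    sign = 1 if operation == "Buy" else -1
    tp_price = entry + sign * tp_pips / PIP
    sl_price = entry - sign * sl_pips / PIP
    for high, low, close in bars:
        if sign == 1:   # long: TP above entry, SL below
            if high >= tp_price:
                return "Won", tp_pips
            if low <= sl_price:
                return "Lost", -sl_pips
        else:           # short: TP below entry, SL above
            if low <= tp_price:
                return "Won", tp_pips
            if high >= sl_price:
                return "Lost", -sl_pips
    # neither barrier touched: close just before the next release, at the last close
    return "Closed", sign * (bars[-1][2] - entry) * PIP
```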
In order to better understand the trading system definition, we'll display its docstring.
### Let's see the docstring for the trading system
help(fn.get_trading_summary)
Help on function get_trading_summary in module functions:
get_trading_summary(data: pandas.core.frame.DataFrame, clasification: pandas.core.frame.DataFrame, pip_up: float, pip_down: float, volume: float, intial_cap: float, scenario=None) -> pandas.core.frame.DataFrame
Trading system definition based on the unemployment rate of the US economy.
The decisions are set from prior knowledge of the behaviour and relationship
between the USDMXN price and the reported value of the indicator. It summarizes the
trading operations and the capital evolution through the life of the operations
Parameters
----------
data: pd.DataFrame (default:None) --> Required parameter
USDMXN prices at minute granularity; the frame has to follow this structure:
'timestamp': First column, the timestamp associated to each price
'open': Second column, the open price for each timestamp
'high': Third column, the high price for each timestamp
'low': Fourth column, the low price for each timestamp
'close': Fifth column, the close price for each timestamp
'volume': Sixth column, the volume traded at each timestamp
'date': Seventh column, the date in YYYY-MM-DD format for each timestamp
clasification: pd.DataFrame (default:None) --> Required parameter
USA unemployment rate reporting from Jan-2018 to Dec-2020 (monthly frequency)
'Datetime': DataFrame index, correspond to the timestamp where the indicator was reported
'Country/Region ': Region of origin (unique value "United States")
'Event ': Indicator name
'Importance ': Level of importance associated to the indicator within the USA economy
'Period ': Reported period
'Actual ': Reported value for the indicator
'Consensus ': Economical expectations to the indicator report value
'Prior ': Previous value corresponding to the indicator (previous month)
'Scenario': Type of scenario definition
pip_up: float (default:None) --> Required parameter
Number of pips that defines an increase in USDMXN prices
pip_down: float (default:None) --> Required parameter
Number of pips that defines a decrease in USDMXN prices
volume: float (default:None) --> Required parameter
Amount of USDMXN to be traded in each operation
intial_cap: float (default:None) --> Required parameter
Initial capital to start the trading system (in USD)
scenario: str (default:None) --> Optional parameter
Scenario for which the trading system is analyzed. If None, all scenarios are displayed
Returns
-------
trading_res: pd.DataFrame
Final summary associated to the trading strategy; it can correspond to a single scenario or
to all of them contained in the clasification data frame. It follows this structure:
'Datetime': Index, timestamp where the indicator was reported
'Scenario': Scenario associated to the trading decision within that timestamp
'Operation': Signal detection (buy or sell)
'Volume': Sell or buy volume associated to the trading decision
'Result': Outcome of the operation (Won or Lost)
'Pip Up': The upper pip barrier defined for that trading strategy
'Pip Down': The lower pip barrier defined for that trading strategy
'Capital': Profit or loss assigned to the operation
'Cumulative Capital': Evolution of the invested capital through the whole period
References
----------
[1] https://pandas.pydata.org/docs/
As we have said, the most important parameters that define the trading system performance are the Take Profit, Stop Loss and Volume. These parameters change according to the scenario we face, as the decisions table above recaps.
In order to achieve the optimal result we implemented Particle Swarm Optimization (PSO), a heuristic algorithm that searches for the optimal parameter combination. With this in mind we need to define an objective function to be minimized.
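The project wraps its optimizer in fn.get_pso; purely as an illustration of the mechanics (not the project's implementation), a minimal PSO loop over a box-bounded search space can be written as:

```python
import numpy as np

def pso(objective, lb, ub, n_particles=30, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm minimizer over the box [lb, ub] (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    x = rng.uniform(lb, ub, (n_particles, lb.size))  # particle positions
    v = np.zeros_like(x)                             # particle velocities
    pbest, pcost = x.copy(), np.array([objective(p) for p in x])
    gbest = pbest[pcost.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lb, ub)                   # keep particles inside the box
        cost = np.array([objective(p) for p in x])
        better = cost < pcost
        pbest[better], pcost[better] = x[better], cost[better]
        gbest = pbest[pcost.argmin()].copy()
    return gbest, pcost.min()

# e.g. minimizing a shifted quadratic inside [0, 10] x [0, 10]
best, best_cost = pso(lambda p: (p[0] - 3) ** 2 + (p[1] - 7) ** 2, [0, 0], [10, 10])
```

Since PSO minimizes, maximizing the Sharpe Ratio amounts to minimizing its negative.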
We're working with trading balances, so to get the best risk-return relationship we cannot simply minimize the losses; we need a function that weighs the returns against the risk exposure. We'll use the Sharpe Ratio.
The Sharpe Ratio is commonly used to compare return against risk. Its formula is presented below:
$$\text{Sharpe Ratio} = \frac{R-R_{f}}{\sigma}$$
where $R$ is the strategy return, $R_{f}$ the risk-free rate, and $\sigma$ the standard deviation of the returns.
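In code, assuming the strategy is scored on a series of per-operation returns, the objective is simply the following (the project's own version is wrapped in fn.max_sharpe):

```python
import numpy as np

def sharpe_ratio(returns, rf=0.02237):
    """Sharpe Ratio of a return series against a risk-free rate (illustrative sketch)."""
    returns = np.asarray(returns, dtype=float)
    return (returns.mean() - rf) / returns.std(ddof=1)
```

Because the PSO minimizes, the optimizer would be fed the negative of this value.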
To generate a significant profit, we must trade high volumes of the asset.
The results of the previously described process are presented in the following cells.
### Let's see the sharpe ratio calculator
help(fn.max_sharpe)
%%time
### Let's maximize the parameters for the A scenario
down_limits_a = [limit_low_bull_a, limit_low_bear_a, 30000]
upper_limits_a = [limit_high_bull_a, limit_high_bear_a, 60000]
rf = 0.02237 # Risk free rate associated to the period to analyze
xopt_a, fopt_a = fn.get_pso(fn.max_sharpe, down_limits_a, upper_limits_a, (dt.training_usdmxn, clasification_train, 100000,
'A', rf), 50, 0.0005)
print(f'The optimal values for this configuration are:{xopt_a.tolist()}')
Stopping search: maximum iterations reached --> 50 The optimal values for this configuration are:[1070.5282762647882, 56.43038703386708, 60000.0] CPU times: total: 4min 50s Wall time: 4min 54s
%%time
### Let's maximize the parameters for the B scenario
down_limits_b = [limit_low_bull_b, limit_low_bear_b, 30000]
upper_limits_b = [limit_high_bull_b, limit_high_bear_b, 60000]
xopt_b, fopt_b = fn.get_pso(fn.max_sharpe, down_limits_b, upper_limits_b, (dt.training_usdmxn, clasification_train, 100000,
'B', rf), 50, 0.0005)
print(f'The optimal values for this configuration are:{xopt_b.tolist()}')
Stopping search: maximum iterations reached --> 50 The optimal values for this configuration are:[316.7851313104189, 183.91274692772677, 60000.0] CPU times: total: 5min 32s Wall time: 5min 34s
%%time
### Let's maximize the parameters for the C scenario
down_limits_c = [limit_low_bear_c, limit_low_bull_c, 5000]
upper_limits_c = [limit_high_bear_c, limit_high_bull_c, 10000]
xopt_c, fopt_c = fn.get_pso(fn.max_sharpe, down_limits_c, upper_limits_c, (dt.training_usdmxn, clasification_train, 100000,
'C', rf), 50, 0.0005)
print(f'The optimal values for this configuration are:{xopt_c.tolist()}')
Stopping search: maximum iterations reached --> 50 The optimal values for this configuration are:[245.32141590528428, 1057.7223455274705, 7129.314231332932] CPU times: total: 1min 56s Wall time: 1min 57s
%%time
### Let's maximize the parameters for the D scenario
down_limits_d = [limit_low_bear_d, limit_low_bull_d, 5000]
upper_limits_d = [limit_high_bear_d, limit_high_bull_d, 10000]
xopt_d, fopt_d = fn.get_pso(fn.max_sharpe, down_limits_d, upper_limits_d, (dt.training_usdmxn, clasification_train, 100000,
'D', rf), 50, 0.0005)
print(f'The optimal values for this configuration are:{xopt_d.tolist()}')
Stopping search: maximum iterations reached --> 50 The optimal values for this configuration are:[448.7224423344216, 712.4391927582963, 10000.0] CPU times: total: 3min 39s Wall time: 3min 40s
### Let's validate the trading strategy for the optimal A scenario
a_test = fn.get_trading_summary(dt.test_usdmxn, clasification_test, xopt_a[0], xopt_a[1], xopt_a[2], 100000, 'A')
a_test
| Scenario | Operation | Volume | Result | Pip Up | Pip Down | Capital | Cumulative Capital | |
|---|---|---|---|---|---|---|---|---|
| Datetime | ||||||||
| 2019-01-04 07:30:00 | A | Sell | 60000.0 | Won | 1070.528276 | 56.430387 | 26.275454 | 100026.275454 |
| 2019-02-01 07:30:00 | A | Sell | 60000.0 | Won | 1070.528276 | 56.430387 | 63.640221 | 100089.915675 |
| 2019-04-05 06:30:00 | A | Sell | 60000.0 | Won | 1070.528276 | 56.430387 | 25.252525 | 100115.168200 |
| 2019-06-07 07:30:00 | A | Sell | 60000.0 | Won | 1070.528276 | 56.430387 | 25.856496 | 100141.024697 |
| 2019-07-05 07:30:00 | A | Sell | 60000.0 | Won | 1070.528276 | 56.430387 | 24.691358 | 100165.716055 |
| 2019-08-02 07:30:00 | A | Sell | 60000.0 | Won | 1070.528276 | 56.430387 | 49.885679 | 100215.601733 |
| 2019-09-06 07:30:00 | A | Sell | 60000.0 | Won | 1070.528276 | 56.430387 | 25.316456 | 100240.918189 |
| 2019-11-01 06:30:00 | A | Sell | 60000.0 | Won | 1070.528276 | 56.430387 | 24.479804 | 100265.397993 |
### Let's validate the trading strategy for the optimal B scenario
b_test = fn.get_trading_summary(dt.test_usdmxn, clasification_test, xopt_b[0], xopt_b[1], xopt_b[2], 100000, 'B')
b_test
| Scenario | Operation | Volume | Result | Pip Up | Pip Down | Capital | Cumulative Capital | |
|---|---|---|---|---|---|---|---|---|
| Datetime |
### Let's validate the trading strategy for the optimal C scenario
c_test = fn.get_trading_summary(dt.test_usdmxn, clasification_test, xopt_c[0], xopt_c[1], xopt_c[2], 100000, 'C')
c_test
| Scenario | Operation | Volume | Result | Pip Up | Pip Down | Capital | Cumulative Capital | |
|---|---|---|---|---|---|---|---|---|
| Datetime | ||||||||
| 2019-05-03 07:30:00 | C | Buy | 7129.314231 | Lost | 245.321416 | 1057.722346 | -37.201598 | 99962.798402 |
| 2019-10-04 07:30:00 | C | Buy | 7129.314231 | Won | 245.321416 | 1057.722346 | 8.963932 | 99971.762334 |
| 2019-12-06 07:30:00 | C | Buy | 7129.314231 | Lost | 245.321416 | 1057.722346 | -37.713565 | 99934.048770 |
### Let's validate the trading strategy for the optimal D scenario
d_test = fn.get_trading_summary(dt.test_usdmxn, clasification_test, xopt_d[0], xopt_d[1], xopt_d[2], 100000, 'D')
d_test
| Scenario | Operation | Volume | Result | Pip Up | Pip Down | Capital | Cumulative Capital | |
|---|---|---|---|---|---|---|---|---|
| Datetime | ||||||||
| 2019-03-08 07:30:00 | D | Buy | 10000.0 | Lost | 448.722442 | 712.439193 | -34.572169 | 99965.427831 |
### Let's see the final balance for the whole trading strategy
final_balance = pd.concat([a_test['Capital'], b_test['Capital'],
c_test['Capital'], d_test['Capital']]).sum()
print(f'The final balance for the trading strategy is of : {round(final_balance, 2)} USD')
The final balance for the trading strategy is of : 164.87 USD
Now, using the optimal parameters obtained from the PSO optimization, we get a final balance for the trading strategy of $\$164.87$ USD. As we can see, the scenarios where we lose more money are the ones defined by buy strategies (C and D). This means that in these cases the expected behaviour of the currency did not materialize. It does not mean that our trading system is completely wrong: many factors influence this result, starting with the period, since we cannot guarantee that the behaviour of the test set replicates that of the training set. There is also the parameter definition: perhaps the ranges analyzed for Take Profit, Stop Loss and Volume were not the best.
As a recommendation, we would explore other ranges for these parameters in order to better model the behaviour of the USDMXN asset.
The knowledge acquired in the Market Microstructure and Trading Systems class was reflected throughout this final project. During all of its stages we exercised research skills, analysis, critical thinking, and both empirical and scientific knowledge.
Starting with the selection of the indicator: the unemployment rate is a significant gauge of the strength of an economy, since it is strongly linked to national productivity, so changes in it have a significant impact on the explored currency, especially considering that the economies involved in the currency pair are strongly connected.
In the first section of the project, the exploration of the databases helps to contextualize the relationship between the indicator and the currency. Empirical validations helped to identify trends and to define expectations for the next stages of the project.
In the statistical part, the behaviour of the indicator's time series is studied in depth. This is extremely important because it is from the mastery of this information that the trading system begins to take shape. Applying different statistical tests, we found the statistical properties of the series; among the most important: the sample does not follow a normal distribution, the variance is not constant through time, the series is not stationary, seasonal and autocorrelation components are present, and atypical values were found, which occurred during 2020.
Carrying out the classification of scenarios in the financial section made it possible to recognize patterns based on logical expectations and the historical information of the indicator, and to obtain the metrics that gave rise to the backtest and optimization phase.
When defining the trading system, we identified a search space for the parameters over which the optimization would be performed; the parameters involved are the Volume, Stop Loss and Take Profit. It is worth emphasizing that this system could be optimized directly on a utility function, but it becomes more sophisticated using a performance attribution measure; we implemented Sharpe's Ratio and Sortino's Ratio.
Developing a trading strategy grounded in fundamental analysis, with data granularity down to the minute, gave us a great perspective on the profit opportunities that can be achieved in the market, and an appreciation of market dynamics driven by a macroeconomic indicator.
The algorithm used for optimization was PSO, as it maximized the profitability of the trading system efficiently: we ended up with a positive balance.
It should be noted that the fact that the trading system generates profits shows that the application of theoretical, financial and computational concepts was sufficient to meet the objective of the project: to create a trading system based on the impact that the unemployment rate has on USDMXN. Still, it is important to keep in mind that our trading system relies on a single economic indicator, so the assumption that one indicator can predict the direction of the currency with high accuracy from past data is unrealistic.